AITopics

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Netherlands (0.04)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Neural Information Processing SystemsDec-25-2025, 17:07:24 GMT

Selecting Optimal Decisions via Distributionally Robust Nearest-Neighbor Regression

This paper develops a prediction-based prescriptive model for optimal decision making that (i) predicts the outcome under each action using a robust nonlinear model, and (ii) adopts a randomized prescriptive policy determined by the predicted outcomes. The predictive model combines a new regularized regression technique, which was developed using Distributionally Robust Optimization (DRO) with an ambiguity set constructed from the Wasserstein metric, with the K-Nearest Neighbors (K-NN) regression, which helps to capture the nonlinearity embedded in the data. We show theoretical results that guarantee the out-of-sample performance of the predictive model, and prove the optimality of the randomized policy in terms of the expected true future outcome. We demonstrate the proposed methodology on a hypertension dataset, showing that our prescribed treatment leads to a larger reduction in the systolic blood pressure compared to a series of alternatives. A clinically meaningful threshold level used to activate the randomized policy is also derived under a sub-Gaussian assumption on the predicted outcome.

distributionally robust nearest-neighbor regression, name change, selecting optimal decision, (3 more...)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.61)

Ruidi Chen, Ioannis Paschalidis

Selecting Optimal Decisions via Distributionally Robust Nearest-Neighbor Regression

Neural Information Processing SystemsOct-3-2025, 05:08:56 GMT

Neighbors (K-NN) regression, which helps to capture the nonlinearity embedded in the data.

artificial intelligence, future outcome, machine learning, (19 more...)

Country: North America (0.28)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.47)
Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Neural Information Processing SystemsOct-2-2025, 22:43:41 GMT

52f4691a4de70b3c441bca6c546979d9-Paper.pdf

blackwell winner, criteria, von neumann winner, (13 more...)

Country:

North America > United States > Michigan (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Industry: Education (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

arXiv.org Artificial IntelligenceAug-15-2025

Competitive Algorithms for Multi-Agent Ski-Rental Problems

Wang, Xuchuang, Sun, Bo, Beyhaghi, Hedyeh, Lui, John C. S., Hajiesmaili, Mohammad, Wierman, Adam

This paper introduces a novel multi-agent ski-rental problem that generalizes the classical ski-rental dilemma to a group setting where agents incur individual and shared costs. In our model, each agent can either rent at a fixed daily cost, or purchase a pass at an individual cost, with an additional third option of a discounted group pass available to all. We consider scenarios in which agents' active days differ, leading to dynamic states as agents drop out of the decision process. To address this problem from different perspectives, we define three distinct competitive ratios: overall, state-dependent, and individual rational. For each objective, we design and analyze optimal deterministic and randomized policies. Our deterministic policies employ state-aware threshold functions that adapt to the dynamic states, while our randomized policies sample and resample thresholds from tailored state-aware distributions. The analysis reveals that symmetric policies, in which all agents use the same threshold, outperform asymmetric ones. Our results provide competitive ratio upper and lower bounds and extend classical ski-rental insights to multi-agent settings, highlighting both theoretical and practical implications for group decision-making under uncertainty.

agent, artificial intelligence, competitive ratio, (15 more...)

2507.15727

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (0.67)
Transportation > Electric Vehicle (0.67)
Automobiles & Trucks (0.67)
Energy > Renewable > Solar (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)

Neural Information Processing SystemsOct-10-2024, 12:10:41 GMT

Selecting Optimal Decisions via Distributionally Robust Nearest-Neighbor Regression

distributionally robust nearest-neighbor regression, randomized policy, selecting optimal decision, (1 more...)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.65)

Jafarnia-Jahromi, Mehdi, Jain, Rahul, Nayyar, Ashutosh

A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent

arXiv.org Artificial IntelligenceMar-11-2024

In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves Bayesian regret bound of $O(HS\sqrt{AT})$ in the infinite-horizon zero-sum stochastic games with average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions and $T$ is the horizon. We consider the online setting where the opponent can not be controlled and can take any arbitrary time-adaptive history-dependent strategy. Our regret bound improves on the best existing regret bound of $O(\sqrt[3]{DS^2AT^2})$ by Wei et al. (2017) under the same assumption and matches the theoretical lower bound in $T$.

agent, algorithm, opponent, (12 more...)

2109.03396

Country:

North America > United States > California (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Games (0.46)
Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.41)

arXiv.org Artificial IntelligenceOct-8-2023

The Reinforce Policy Gradient Algorithm Revisited

Bhatnagar, Shalabh

We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random length episodes obtained from either termination upon reaching a goal state (as with episodic tasks) or from instants of visit to a prescribed recurrent state (in the case of continuing tasks). We propose a major enhancement to the basic algorithm. We estimate the policy gradient using a function measurement over a perturbed parameter by appealing to a class of random search approaches. This has advantages in the case of systems with infinite state and action spaces as it relax some of the regularity requirements that would otherwise be needed for proving convergence of the Reinforce algorithm. Nonetheless, we observe that even though we estimate the gradient of the performance objective using the performance objective itself (and not via the sample gradient), the algorithm converges to a neighborhood of a local minimum. We also provide a proof of convergence for this new algorithm.

algorithm, denote, gradient, (15 more...)

2310.05

Country:

North America > United States > New York (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Huang, Jiatai, Golubchik, Leana, Huang, Longbo

Queue Scheduling with Adversarial Bandit Learning

arXiv.org Artificial IntelligenceMar-3-2023

In this paper, we study scheduling of a queueing system with zero knowledge of instantaneous network conditions. We consider a one-hop single-server queueing system consisting of $K$ queues, each with time-varying and non-stationary arrival and service rates. Our scheduling approach builds on an innovative combination of adversarial bandit learning and Lyapunov drift minimization, without knowledge of the instantaneous network state (the arrival and service rates) of each queue. We then present two novel algorithms \texttt{SoftMW} (SoftMaxWeight) and \texttt{SSMW} (Sliding-window SoftMaxWeight), both capable of stabilizing systems that can be stablized by some (possibly unknown) sequence of randomized policies whose time-variation satisfies a mild condition. We further generalize our results to the setting where arrivals and departures only have bounded moments instead of being deterministically bounded and propose \texttt{SoftMW+} and \texttt{SSMW+} that are capable of stabilizing the system. As a building block of our new algorithms, we also extend the classical \texttt{EXP3.S} (Auer et al., 2002) algorithm for multi-armed bandits to handle unboundedly large feedback signals, which can be of independent interest.

artificial intelligence, data mining, machine learning, (16 more...)

2303.01745

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Energy > Power Industry (0.67)
Transportation > Infrastructure & Services (0.67)
Information Technology (0.67)
Transportation > Ground > Road (0.45)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)

Chen, Ruidi, Paschalidis, Ioannis Ch.

Distributionally Robust Learning

arXiv.org Machine LearningAug-20-2021

This monograph develops a comprehensive statistical learning framework that is robust to (distributional) perturbations in the data using Distributionally Robust Optimization (DRO) under the Wasserstein metric. Beginning with fundamental properties of the Wasserstein metric and the DRO formulation, we explore duality to arrive at tractable formulations and develop finite-sample, as well as asymptotic, performance guarantees. We consider a series of learning problems, including (i) distributionally robust linear regression; (ii) distributionally robust regression with group structure in the predictors; (iii) distributionally robust multi-output regression and multiclass classification, (iv) optimal decision making that combines distributionally robust regression with nearest-neighbor estimation; (v) distributionally robust semi-supervised learning, and (vi) distributionally robust reinforcement learning. A tractable DRO relaxation for each problem is being derived, establishing a connection between robustness and regularization, and obtaining bounds on the prediction and estimation errors of the solution. Beyond theory, we include numerical experiments and case studies using synthetic and real data. The real data experiments are all associated with various health informatics problems, an application area which provided the initial impetus for this work.

distributionally robust grouped variable selection, generalization performance, robustness and regularization, (15 more...)

arXiv.org Machine Learning

2108.08993

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.92)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
(3 more...)